On Optimal Budget-Driven Scheduling Algorithms for MapReduce Jobs in the Heterogeneous Cloud

Authors

  • Yang Wang
  • Wei Shi
Abstract

In this paper, we consider task-level scheduling algorithms with respect to budget and deadline constraints for a bag of MapReduce jobs on a set of provisioned heterogeneous (virtual) machines in cloud platforms. Heterogeneity is manifested in the "pay-as-you-go" charging model we use, where service machines with different performance have different service rates. We organize the bag of jobs as a κ-stage workflow and achieve, for specific optimization goals, the following results. First, given a total monetary budget B_j for a particular stage j, we propose a greedy algorithm for distributing the budget, with minimal stage execution time as our goal. Based on the structure of this problem, we further prove the optimality of our algorithm in terms of the budget used and the execution time achieved. We then combine this algorithm with dynamic programming techniques to propose an optimal scheduling algorithm that obtains a minimum scheduling length in O(κB). The algorithm is efficient if the total budget B is polynomially bounded by the number of tasks in the MapReduce jobs, which is usually the case in practice. Second, we consider the dual of this optimization problem: minimizing the cost when the (time) deadline D of the computation is fixed. We convert this problem into the standard multiple-choice knapsack problem via a parallel transformation. Our empirical studies verify the proposed optimal algorithms.

Keywords: heterogeneous clouds, MapReduce optimization, optimal Hadoop scheduling algorithm, budget constraints
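The two-level idea in the abstract (a greedy budget distribution inside one stage, combined with dynamic programming across the κ stages) can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not reproduce their optimality proof or the O(κB) bound; it only shows the general shape under several assumptions. The input format is hypothetical: each task is a list of (time, price) options sorted from slowest and cheapest to fastest and most expensive, every stage has at least one task, tasks within a stage run in parallel so a stage takes as long as its slowest task, and budgets are whole units. The names `stage_time` and `min_makespan` are invented for this illustration.

```python
import heapq

def stage_time(tasks, budget):
    """Greedy sketch: keep moving the slowest task to its next faster option.

    tasks: list of option lists, each option a (time, price) pair, sorted
    from slowest/cheapest to fastest/priciest (assumed input format).
    Returns the stage time, or None if even the cheapest assignment
    exceeds the budget.
    """
    base = sum(opts[0][1] for opts in tasks)
    if base > budget:
        return None
    remaining = budget - base
    level = [0] * len(tasks)  # index of the option each task currently uses
    # heapq is a min-heap, so times are negated to get the slowest task first.
    heap = [(-opts[0][0], i) for i, opts in enumerate(tasks)]
    heapq.heapify(heap)
    while True:
        _, i = heap[0]
        nxt = level[i] + 1
        if nxt == len(tasks[i]):
            break  # the slowest task is already on its fastest option
        extra = tasks[i][nxt][1] - tasks[i][level[i]][1]
        if extra > remaining:
            break  # cannot afford to speed up the slowest task
        remaining -= extra
        level[i] = nxt
        heapq.heapreplace(heap, (-tasks[i][nxt][0], i))
    return -heap[0][0]

def min_makespan(stages, budget):
    """Naive dynamic program over stages and integer budget units.

    best[b] holds the minimum total time of the stages processed so far
    when they may spend at most b units together. This straightforward
    recurrence tries every split of the budget for every stage.
    """
    inf = float("inf")
    best = [0.0] * (budget + 1)
    for tasks in reversed(stages):
        times = [stage_time(tasks, s) for s in range(budget + 1)]
        new = [inf] * (budget + 1)
        for b in range(budget + 1):
            for spend in range(b + 1):
                t = times[spend]
                if t is not None and best[b - spend] < inf:
                    new[b] = min(new[b], t + best[b - spend])
        best = new
    return best[budget]

# Made-up example data: two stages with two and one tasks respectively.
stage_a = [[(10, 1), (4, 3)], [(8, 1), (5, 2)]]
stage_b = [[(6, 1), (2, 4)]]
print(min_makespan([stage_a, stage_b], 9))
```

With the made-up data above, the call returns infinity when the budget cannot cover the cheapest assignment of every stage. A larger budget lets the dynamic program trade spending between stages.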

Similar resources

On Scheduling Algorithms for MapReduce Jobs in Heterogeneous Clouds with Budget Constraints

In this paper, we consider task-level scheduling algorithms with respect to budget constraints for a bag of MapReduce jobs on a set of provisioned heterogeneous (virtual) machines in cloud platforms. The heterogeneity is manifested in the popular "pay-as-you-go" charging model where the service machines with different performance would have different service rates. We organize a bag of jobs as ...

A Cross-Jobs-Cross-Phases Map-Reduce Scheduling Algorithm in Heterogeneous Cloud

To process large-scale data quickly, the map-reduce cloud is viewed as a reasonable and effective platform. To address the new scheduling challenges in the map-reduce cloud, a cross-jobs-cross-phases (CJCP) map-reduce scheduling algorithm is proposed in this paper. CJCP mainly consists of four optimization schemes, which respectively deal with four resource-waste scenarios in the job scheduling process....

Diagnosing Heterogeneous Hadoop Clusters

We present a data-driven approach for diagnosing performance issues in heterogeneous Hadoop clusters. Hadoop is a popular and extremely successful framework for horizontally scalable distributed computing over large data sets based on the MapReduce framework. In its current implementation, Hadoop assumes a homogeneous cluster of compute nodes. This assumption manifests in Hadoop’s scheduling al...

A Throughput Driven Task Scheduler for Batch Jobs in Shared MapReduce Environments

MapReduce is one of the most popular parallel data processing systems, and it has been widely used in many fields. As one of the most important techniques in MapReduce, task scheduling strategy is directly related to the system performance. However, in multi-user shared MapReduce environments, the existing task scheduling algorithms cannot provide high system throughput when processing batch jo...

Job Attentive Scheduling Algorithm in Hadoop

In recent years cloud services have gained much attention as a result of their availability, scalability, and low cost. One use of these services has been for the execution of scientific workflows as part of Big Data Analytics, which are employed in a diverse range of fields including astronomy, physics, seismology, and bioinformatics. There has been much research on heuristic scheduling algori...

Publication date: 2013